Abstract

In the last two decades, two related categories of problems have been studied independently in the 
image restoration literature: super-resolution and demosaicing. A closer look at these problems reveals the 
relation between them, and as conventional color digital cameras suffer from both low-spatial resolution 
and color-filtering, it is reasonable to address them in a unified context. In this paper, we propose a fast 
and robust hybrid method of super-resolution and demosaicing, based on a MAP estimation technique 
by minimizing a multi-term cost function. The L1 norm is used for measuring the difference between
the projected estimate of the high-resolution image and each low-resolution image, removing outliers 
in the data and errors due to possibly inaccurate motion estimation. Bilateral regularization is used 
for spatially regularizing the luminance component, resulting in sharp edges and forcing interpolation 
along the edges and not across them. Simultaneously, Tikhonov regularization is used to smooth the 
chrominance components. Finally, an additional regularization term is used to force similar edge location 
and orientation in different color channels. We show that the minimization of the total cost function is 
relatively easy and fast. Experimental results on synthetic and real data sets confirm the effectiveness of 
our method. 
I. Introduction

Several distorting processes affect the quality of images acquired by commercial digital cameras. Some 
of the more important distorting effects include warping, blurring, color-filtering, and additive noise. A 
common image formation model for such imaging systems is illustrated in Figure 1¹. In this model, a
real-world scene is seen to be warped at the camera lens because of the relative motion between the
scene and camera. The imperfections of the optical lens result in the blurring of this warped image
which is then sub-sampled and color-filtered at the CCD. The additive readout noise at the CCD will 
further degrade the quality of captured images. 

There is a growing interest in the multi-frame image reconstruction algorithms that compensate for the 
shortcomings of the imaging system. Such methods can achieve high-quality images using less expensive 
imaging chips and optical components by capturing multiple images and fusing them. 

In digital photography, two image reconstruction problems have been studied and solved independently:
super-resolution (SR) and demosaicing. The former refers to the limited number of pixels and the desire
to go beyond this limit using several exposures. The latter refers to the color-filtering applied on a single
CCD array of sensors on most cameras, which measures a subset of R (red), G (green), and B (blue)
values instead of a full RGB field². It is natural to consider these problems in a joint setting because
both refer to resolution limitations at the camera. Also, since the measured images are mosaiced, solving 
the super-resolution problem using pre-processed (demosaiced) images is sub-optimal and hence inferior 
to a single unifying solution framework. In this paper we propose a fast and robust method for joint 
multi-frame demosaicing and color super-resolution. 

The organization of this paper is as follows. In Section II we review the super-resolution and demosaicing
problems and the inefficiency of independent solutions for them. In Section III we formulate and
analyze a general model for imaging systems applicable to various scenarios of multi-frame image
reconstruction. We also formulate and review the basics of the maximum a posteriori (MAP) estimator, robust
data fusion, and regularization methods. Armed with the material developed in earlier sections, in Section
IV we present and formulate our joint multi-frame demosaicing and color super-resolution method. In
Section V we review two related methods of multi-frame demosaicing. Simulations on both synthetic
and real data sequences are given in Section VI, and concluding remarks are drawn in Section VII.

¹This paper (with all color pictures and a MATLAB-based software package for resolution enhancement) is available at
http://www.ee.ucsc.edu/~milanfar.

²Three-CCD cameras, which measure each color field independently, tend to be relatively more expensive.

Fig. 1. Block diagram representing the image formation model considered in this paper, where X is the intensity distribution
of the scene, V is the additive noise, and Y is the resulting color-filtered low-quality image. The operators F, H, D, and A
represent the warping, blurring, down-sampling, and color-filtering processes, respectively.

II. AN OVERVIEW OF SUPER-RESOLUTION AND DEMOSAICING PROBLEMS 

In this section, we study and review some of the previous work on super-resolution and demosaicing 
problems. We show the inefficiency of independent solutions for these problems and discuss the obstacles 
to designing a unified approach for addressing these two common shortcomings of digital cameras. 

A. Super-Resolution 

Digital cameras have a limited spatial resolution, dictated by their optical lens and CCD array.
Surpassing this limit can be achieved by acquiring and fusing several low-resolution (LR) images of 
the same scene, producing high-resolution (HR) images; this is the basic idea behind super-resolution 
techniques [1], [2], [3], [4]. 

In the last two decades a variety of super-resolution methods have been proposed for estimating the 
HR image from a set of LR images. Early works on SR showed that the aliasing effects in the LR images
enable the recovery of the fused HR image, provided that a relative sub-pixel motion
exists between the under-sampled input images [5]. However, in contrast to the clean and practically naive
frequency domain description of SR in that early work, in general SR is a computationally complex and
numerically ill-behaved problem in many instances [6]. In recent years more sophisticated SR methods
were developed (see [3], [6], [7], [8], [9], [10] as representative works).

Note that almost all super-resolution methods to date have been designed to increase the resolution 
of a single channel (monochromatic) image. A related problem, color SR, addresses fusing a set of 
previously demosaiced color LR frames to enhance their spatial resolution. To date, there is very little 
work addressing the problem of color SR. The typical solution involves applying monochromatic SR 
algorithms to each of the color channels independently [11], [12], while using the color information to 
improve the accuracy of motion estimation. Another approach is transforming the problem to a different 
color space, where chrominance layers are separated from luminance, and SR is applied only to the 
luminance channel [7]. Both of these methods are sub-optimal as they do not fully exploit the correlation 
across the color bands. 
In Section VI we show that ignoring the relation between different color channels will result in color
artifacts in the super-resolved images. Moreover, as we will advocate later in this paper, even a proper 
treatment of the relation between the color layers is not sufficient for removing color artifacts if the 
measured images are mosaiced. This brings us to the description of the demosaicing problem. 

B. Demosaicing 

A color image is typically represented by combining three separate monochromatic images. Ideally,
each pixel reflects three data measurements, one for each of the color bands³. In practice, to reduce
production cost, many digital cameras have only one color measurement (red, green, or blue) per pixel⁴.

The detector array is a grid of CCDs, each made sensitive to one color by placing a color-filter 
array (CFA) in front of the CCD. The Bayer pattern shown on the left hand side of Figure 3 is a very 
common example of such a color-filter. The values of the missing color bands at every pixel are often 
synthesized using some form of interpolation from neighboring pixel values. This process is known as 
color demosaicing. 
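To make the Bayer sampling concrete, here is a minimal Python sketch (not taken from the paper) that keeps one color sample per pixel. The GRBG phase of the 2×2 tile and the helper names `bayer_channel` and `mosaic` are illustrative assumptions; any of the four Bayer phases gives the same 2:1:1 ratio of green, red, and blue samples.

```python
# Assumed GRBG phase of the 2x2 Bayer tile (the text does not fix the phase).
BAYER_TILE = (("G", "R"),
              ("B", "G"))


def bayer_channel(row, col):
    """Return the color ('R', 'G' or 'B') sampled at pixel (row, col)."""
    return BAYER_TILE[row % 2][col % 2]


def mosaic(planes):
    """Keep one color sample per pixel according to the Bayer pattern.

    planes maps 'R', 'G', 'B' to equally sized 2-D lists of intensities.
    Returns the single-channel mosaic and the number of samples of each color.
    """
    rows, cols = len(planes["G"]), len(planes["G"][0])
    out = [[0.0] * cols for _ in range(rows)]
    counts = {"R": 0, "G": 0, "B": 0}
    for i in range(rows):
        for j in range(cols):
            color = bayer_channel(i, j)
            out[i][j] = planes[color][i][j]
            counts[color] += 1
    return out, counts


# Usage: a 4x4 image yields 8 green, 4 red and 4 blue samples.
planes = {c: [[v] * 4 for _ in range(4)] for c, v in (("R", 1.0), ("G", 2.0), ("B", 3.0))}
raw, counts = mosaic(planes)
print(counts)  # {'R': 4, 'G': 8, 'B': 4}
```

With this phase every red and blue site has four green horizontal/vertical neighbors, which is the regularity that single-frame demosaicing methods rely on.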

Numerous demosaicing methods have been proposed through the years to solve this under-determined 
problem, and in this section we review some of the more popular ones. Of course, one can estimate the 
unknown pixel values by linear interpolation of the known ones in each color band independently. This 
approach will ignore some important information about the correlation between the color bands and will 
result in serious color artifacts. Note that the Red and Blue channels are down-sampled two times more
than the Green channel. It is reasonable to assume that the independent interpolation of the Green band
will result in a more reliable reconstruction than that of the Red or Blue bands. This property, combined with
the assumption that the Red/Green and Blue/Green ratios are similar for neighboring pixels, forms the basis

of the smooth hue transition method first discussed in [13].
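The ratio assumption behind the smooth hue transition method can be sketched in a few lines of Python. This is a hedged illustration of the idea only, not the exact procedure of [13]: `smooth_hue_red` is a hypothetical helper that assumes green has already been interpolated everywhere, and it estimates a missing red value by scaling the local green value by the average Red/Green ratio of nearby pixels where red was measured.

```python
def smooth_hue_red(g_center, red_neighbors):
    """Hypothetical helper illustrating the smooth hue (R/G ratio) assumption.

    g_center: green value at the pixel (already interpolated).
    red_neighbors: (R, G) pairs at nearby pixels where red was measured and
    green is known or has been interpolated.
    """
    ratios = [r / g for r, g in red_neighbors if g != 0]
    if not ratios:
        raise ValueError("need at least one neighbor with nonzero green")
    # Assume the hue (R/G ratio) is locally constant and transfer it to this pixel.
    return g_center * sum(ratios) / len(ratios)


print(smooth_hue_red(10.0, [(5.0, 10.0), (6.0, 12.0)]))  # 5.0
```

The missing blue values would be handled the same way using Blue/Green ratios.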

Note that there is a negligible correlation between the values of neighboring pixels located on the 
different sides of an edge. Therefore, although the smooth hue transition assumption is logical for smooth 
regions of the reconstructed image, it is not successful in the high-frequency (edge) areas. Considering 
this fact, gradient-based methods, first addressed in [14], do not perform interpolation across the edges of
an image. This non-iterative method uses the second derivative of the Red and Blue channels to estimate 
the edge direction in the Green channel. Later, the Green channel is used to compute the missing values 
in the Red and Blue channels. 
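A minimal sketch of edge-directed green interpolation in the spirit of the gradient-based methods above. Assumptions: a Bayer mosaic stored as a 2-D list, a red or blue site (i, j) at least two pixels from the border, and a simple rule that compares second derivatives of the site's own channel horizontally and vertically; the precise estimators of [14] and [15] differ in detail, and `interpolate_green_at` is a hypothetical name.

```python
def interpolate_green_at(c, i, j):
    """Estimate the missing green value at a red/blue site (i, j) of Bayer mosaic c.

    Assumes (i, j) is at least two pixels away from every border.
    """
    # Second derivatives of the site's own color channel, which repeats every
    # two pixels in a Bayer pattern.
    dh = abs(2 * c[i][j] - c[i][j - 2] - c[i][j + 2])
    dv = abs(2 * c[i][j] - c[i - 2][j] - c[i + 2][j])
    g_h = (c[i][j - 1] + c[i][j + 1]) / 2  # average of horizontal green neighbors
    g_v = (c[i - 1][j] + c[i + 1][j]) / 2  # average of vertical green neighbors
    if dh < dv:   # smoother along the row: interpolate horizontally
        return g_h
    if dv < dh:   # smoother along the column: interpolate vertically
        return g_v
    return (g_h + g_v) / 2


# A vertical edge between columns 1 and 2: interpolation runs along the column.
edge = [[0.0 if j < 2 else 100.0 for j in range(5)] for i in range(5)]
print(interpolate_green_at(edge, 2, 2))  # 100.0
```

Averaging all four neighbors would instead give 75.0 here, blending values from both sides of the edge.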

³This is the scenario for the more expensive 3-CCD cameras.

⁴This is the scenario for cheaper 1-CCD cameras.
A variation of this method was later proposed in [15], where the second derivative of the Green
channel and the first derivative of the Red (or Blue) channels are used to estimate the edge direction
in the Green channel. The smooth hue and gradient-based methods were later combined in [44]. In this
iterative method, the smooth hue interpolation is done with respect to the local gradients computed in
eight directions about a pixel of interest. A second stage using anisotropic inverse diffusion further
enhances the quality of the reconstructed image. This two-step approach of interpolation followed by an
enhancement step has been used in many other publications. In [16], spatial and spectral correlations
among neighboring pixels are exploited to define the interpolation step, while adaptive median filtering
is used as the enhancement step. A different iterative implementation of median filters is used as the
enhancement step of the method described in [17], which takes advantage of a homogeneity assumption in
the neighboring pixels.

Iterative MAP methods form another important category of demosaicing methods. A MAP algorithm 
with a smooth chrominance prior is discussed in [18]. The smooth chrominance prior is also used in 
[19], where the original image is transformed to the YIQ representation. The chrominance interpolation is
performed using isotropic smoothing. The luminance interpolation is done using edge directions computed
in a steerable wavelet pyramidal structure. 

Other examples of popular demosaicing methods available in published literature are [20], [21], [22], 
[23], [24], [25], and [26]. Almost all of the proposed demosaicing methods are based on one or more of 
these following assumptions: 

1) In the constructed image with the mosaicing pattern, there are more green sensors, with a regular
pattern of distribution, than blue or red ones (in the case of the Bayer CFA there are twice as many
green pixels as red or blue pixels, and each red or blue pixel is surrounded by four green pixels).

2) Most algorithms assume a Bayer CFA pattern, for which each red, green and blue pixel is a neighbor 
to pixels of different color bands. 

3) For each pixel one and only one color band value is available. 

4) The pattern of pixels does not change through the image. 

5) The human eye is more sensitive to the details in the luminance component of the image than the 
details in chrominance component [19]. 

6) The human eye is more sensitive to chromatic changes in the low spatial frequency region than the 
luminance change [23]. 

7) Interpolation should be performed along and not across the edges.

8) Different color bands are correlated with each other. 
9) Edges should align between color channels.

Note that even the most popular and sophisticated demosaicing methods will fail to produce satisfactory 
results when severe aliasing is present in the color-filtered image. Such severe aliasing happens in cheap
commercial still or video digital cameras with a small number of CCD pixels. The color artifacts worsen
as the number of CCD pixels decreases. The following example shows this effect. 

Figure 2.a shows an HR image captured by a 3-CCD camera. If, instead of a 3-CCD camera, a 1-CCD
camera with the same number of CCD pixels had been used to capture this image, the inevitable mosaicing
process would result in color artifacts. Figure 2.d shows the result of applying the demosaicing method of [44],
with some negligible color artifacts on the edges.

Note that many commercial digital video cameras can only be used in lower spatial resolution modes
while working at higher frame rates. Figure 2.b shows the same scene from a 3-CCD camera with a
down-sampling factor of 4, and Figure 2.e shows its demosaiced image after color-filtering. Note that the
color artifacts in this image are much more evident than in 2.d. These color artifacts may be reduced by
low-pass filtering the input data before color-filtering. Figure 2.c shows a factor-of-four down-sampled
version of 2.a, which is blurred with a symmetric Gaussian low-pass filter of size 4x4 with standard
deviation equal to one before down-sampling. The demosaiced image shown in 2.f has fewer color artifacts
than 2.e; however, it has lost some high-frequency details.

The poor quality of single-frame demosaiced images motivates us to search for multi-frame demosaicing
methods, in which the information of several low-quality images is fused together to produce
high-quality demosaiced images.

C. Merging super-resolution and demosaicing into one process 

Referring to the mosaic effects, the geometries of the single-frame and multi-frame demosaicing problems
are fundamentally different, making it impossible to simply cross-apply traditional demosaicing algorithms
to the multi-frame situation. To better understand the multi-frame demosaicing problem, we offer an
example for the case of translational motion. Suppose that a set of color-filtered LR images is available
(images on the left in Figure 3). We use the two-step process explained in Section III-F to fuse these images.
The Shift-And-Add image on the right side of Figure 3 illustrates the pattern of sensor measurements in
the HR image grid. In such situations, the sampling pattern is quite arbitrary, depending on the relative
motion of the LR images. This necessitates different demosaicing algorithms than those designed for the
original Bayer pattern.



[Figure 2 panels: (a) Original; (b) Down-sampled; (c) Blurred and down-sampled; (d) Demosaiced (a); (e) Demosaiced (b); (f) Demosaiced (c).]

Fig. 2. An HR image (a) captured by a 3-CCD camera is down-sampled by a factor of four (b). In (c) the image in (a) is
blurred by a Gaussian kernel before down-sampling by a factor of 4. The images in (a), (b), and (c) are color-filtered and then
demosaiced by the method of [44]. The results are shown in (d), (e), (f), respectively.


Figure 3 shows that treating the green channel differently from the red or blue channels, as done in
many single-frame demosaicing methods, is not useful for the multi-frame case. While globally
there are more green pixels than blue or red pixels, locally any pixel may be surrounded by only red or
blue colors. Hence, there is no general preference for one color band over the others (the first and second
assumptions in Section II-B are not true for the multi-frame case).

Another assumption, the availability of one and only one color band value for each pixel, is also not
correct in the multi-frame case. In the under-determined cases⁵, there are not enough measurements to fill
the HR grid. The symbol “?” in Figure 3 represents such pixels. On the other hand, in the over-determined
cases⁶, for some pixels, there may in fact be more than one color value available.

Fig. 3. Fusion of 7 Bayer pattern LR images with relative translational motion (the figures on the left side of the accolade)
results in an HR image (Z) that does not follow the Bayer pattern (the figure on the right side of the accolade). The symbol “?”
represents the HR pixel values that were undetermined (as a result of insufficient LR frames) after the Shift-And-Add step
(the Shift-And-Add method is extensively discussed in [3], and briefly reviewed in Section III-F).

The fourth assumption in the existing demosaicing literature described earlier is not true because the 
field of view (FOV) of real-world LR images changes from one frame to the other, so the center and the
border patterns of red, green, and blue pixels differ in the resulting HR image. 

III. MATHEMATICAL MODEL AND SOLUTION OUTLINE
A. Mathematical Model of the Imaging System 

Figure 1 illustrates the image degradation model that we consider. We represent this approximated 
forward model by the following equation: 

$$Y_i(k) = D_i(k)H(k)F(k)X_i + V_i(k) = T_i(k)X_i + V_i(k), \qquad k = 1,\dots,N, \quad i = R,G,B \tag{1}$$

⁵Where the number of non-redundant LR frames is smaller than the square of the resolution enhancement factor. A resolution
enhancement factor of r means that LR images of dimension M x M produce an HR output of dimension rM x rM.

⁶Where the number of non-redundant LR frames is larger than the square of the resolution enhancement factor.


The vectors $X_i$ and $Y_i(k)$ represent the $i$th band (R, G, or B) of the HR color frame and the
LR frame after lexicographic ordering, respectively. Matrix $F(k)$ is the geometric motion operator
between the HR and LR frames. The camera’s point spread function (PSF) is modeled by the blur matrix
$H(k)$. The matrix $D_i(k)$ represents the down-sampling operator, which includes both the color-filtering
and CCD down-sampling operations⁷. Geometric motion, blur, and down-sampling operators are covered
by the operator $T_i(k)$, which we call the system matrix. The vector $V_i(k)$ is the system noise and $N$ is
the number of available LR frames.

The HR color image $X$ is of size $[12r^2M^2 \times 1]$, where $r$ is the resolution enhancement factor. The
size of the vectors $V_G(k)$ and $Y_G(k)$ is $[2M^2 \times 1]$, and the vectors $V_R(k)$, $Y_R(k)$, $V_B(k)$, and $Y_B(k)$ are of
size $[M^2 \times 1]$. The geometric motion and blur matrices are of size $[4r^2M^2 \times 4r^2M^2]$. The down-sampling
and system matrices are of size $[2M^2 \times 4r^2M^2]$ for the Green band and of size $[M^2 \times 4r^2M^2]$ for the
Red and Blue bands⁸.
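As a sanity check of the sizes listed above, the following Python sketch simulates one frame of the forward model (1) for a single band. Every operator is a simplified stand-in chosen for illustration, not the paper's implementation: F(k) is an integer circular shift on the HR grid, H is a 3x3 box blur, D keeps every r-th pixel, the color filter assumes a GRBG Bayer phase, and V is white Gaussian noise. With an HR band of size 2rM x 2rM, the green observation has 2M² entries and the red and blue observations have M² entries each.

```python
import random

BAYER_TILE = (("G", "R"), ("B", "G"))  # assumed GRBG phase of the CFA


def shift(img, dy, dx):
    """F(k): circular translation by (dy, dx) HR pixels (boundary handling assumed)."""
    h, w = len(img), len(img[0])
    return [[img[(i - dy) % h][(j - dx) % w] for j in range(w)] for i in range(h)]


def box_blur(img):
    """H: 3x3 moving average with circular boundaries, a stand-in for the camera PSF."""
    h, w = len(img), len(img[0])
    return [[sum(img[(i + a) % h][(j + b) % w] for a in (-1, 0, 1) for b in (-1, 0, 1)) / 9.0
             for j in range(w)] for i in range(h)]


def downsample(img, r):
    """D: keep every r-th pixel in each direction."""
    return [row[::r] for row in img[::r]]


def color_filter(img, band):
    """A_i: keep only the CFA sites of one band, in lexicographic (row-major) order."""
    return [v for i, row in enumerate(img) for j, v in enumerate(row)
            if BAYER_TILE[i % 2][j % 2] == band]


def observe(x, band, r, dy, dx, sigma=0.0, seed=0):
    """Y_i(k) = A_i D H F(k) X_i + V_i(k) for one band and one frame."""
    rng = random.Random(seed)
    y = color_filter(downsample(box_blur(shift(x, dy, dx)), r), band)
    return [v + rng.gauss(0.0, sigma) for v in y]


M, r = 2, 2
x = [[float(8 * i + j) for j in range(2 * r * M)] for i in range(2 * r * M)]
print(len(observe(x, "G", r, dy=1, dx=0)), len(observe(x, "R", r, dy=0, dx=1)))  # 8 4
```

In a real camera the motion is sub-pixel and the PSF is measured or estimated; integer shifts are used here only to keep the sketch short.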

Considered separately, super-resolution and demosaicing models are special cases of the general model 
presented above. In particular, in the super-resolution literature the effect of color-filtering is usually 
ignored [9], [10], [3] and therefore the model is simplified to: 


$$Y(k) = D(k)H(k)F(k)X + V(k), \qquad k = 1,\dots,N \tag{3}$$


In this model the LR images $Y(k)$ and the HR image $X$ are assumed to be monochromatic. On the
other hand, in the demosaicing literature only single-frame reconstruction of color images is considered,
resulting in a simplified model:

$$Y_i = D_i X_i + V_i, \qquad i = R,G,B \tag{4}$$

⁷It is convenient to think of $D_i(k) = A_i(k)D(k)$, where $D(k)$ models the incoherent down-sampling effect of the CCD and
$A_i(k)$ models the color-filter effect [27].

⁸Note that color super-resolution by itself is a special case of this model, where the vectors $V_i(k)$ and $Y_i(k)$ are of size
$[4M^2 \times 1]$ and the matrices $T_i(k)$ and $D_i(k)$ are of size $[4M^2 \times 4r^2M^2]$ for any color band.

As such, the classical approach to the multi-frame reconstruction of color images has been a two-step
process. The first step is to solve (4) for each image (demosaicing step) and the second step is to use
the model in (3) to fuse the LR images resulting from the first step, reconstructing the color HR image
(usually each of the R, G, or B bands is processed individually). Of course, this two-step method is a suboptimal
approach to solving the overall problem. In Section IV, we propose a Maximum A-Posteriori (MAP)
estimation approach to directly solve (1).

B. MAP Approach to Multi-Frame Image Reconstruction 

Following the forward model of (1), the problem of interest is an inverse problem, wherein the source 
of information (HR image) is estimated from the observed data (LR images). An inherent difficulty with 
inverse problems is the challenge of inverting the forward model without amplifying the effect of noise 
in the measured data. In many real scenarios, the problem is worsened by the fact that the system matrix 
T is singular or ill-conditioned. Thus, for the problem of super-resolution, some form of regularization 
must be included in the cost function to stabilize the problem or constrain the space of solutions. 

From a statistical perspective, regularization is incorporated as a priori knowledge about the solution. 
Thus, using the Maximum A-Posteriori (MAP) estimator, a rich class of regularization functions emerges, 
enabling us to capture the specifics of a particular application. This can be accomplished by way of 
Lagrangian type penalty terms as in 

$$\hat{X} = \operatorname{ArgMin}_{X} \left[ \rho(Y, TX) + \lambda \Upsilon(X) \right], \tag{5}$$

where $\rho$, the data fidelity term, measures the “distance” between the model and measurements, and $\Upsilon$ is
the regularization cost function, which imposes a penalty on the unknown X to direct it to a better-formed
solution. The regularization parameter $\lambda$ is a scalar for properly weighting the first term (data fidelity
cost) against the second term (regularization cost). Generally speaking, $\lambda$ can be chosen either
manually, using visual inspection, or automatically, using methods like Generalized Cross-Validation [28],
[29], the L-curve [30], or other techniques. How to choose such regularization parameters is in itself a vast
topic, which we will not treat in the present paper.
C. Monochromatic Spatial Regularization

Tikhonov regularization, of the form $\Upsilon(X) = \|\Lambda X\|_2^2$, is a widely employed form of regularization [9],
[6], where $\Lambda$ is a matrix capturing some aspects of the image such as its general smoothness. Tikhonov
regularization penalizes energy in the higher frequencies of the solution, opting for a smooth and hence
blurry image.
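To make (5) concrete, the sketch below minimizes the cost with $\rho$ chosen as the squared L2 norm (7) and $\Upsilon$ as Tikhonov regularization, using plain gradient descent. To keep it short it assumes a 1-D signal, T = I (pure denoising) rather than the paper's system matrix, and $\Lambda$ taken to be a first-difference operator; the function names are hypothetical. The step size 1/(2 + 8λ) is the inverse of an upper bound on the largest eigenvalue of the Hessian 2I + 2λΛᵀΛ, which ensures each step does not increase the cost.

```python
def tikhonov_cost(x, y, lam):
    """rho + lambda * Upsilon with rho = ||y - x||_2^2 and Upsilon = ||Lambda x||_2^2."""
    fidelity = sum((yn - xn) ** 2 for xn, yn in zip(x, y))
    roughness = sum((x[n + 1] - x[n]) ** 2 for n in range(len(x) - 1))
    return fidelity + lam * roughness


def tikhonov_denoise(y, lam, iters=500):
    """Minimize tikhonov_cost over x by gradient descent, starting from x = y."""
    x = list(y)
    step = 1.0 / (2.0 + 8.0 * lam)  # 1 / (bound on the Hessian's largest eigenvalue)
    for _ in range(iters):
        d = [x[n + 1] - x[n] for n in range(len(x) - 1)]  # Lambda x
        grad = []
        for n in range(len(x)):
            g = 2.0 * (x[n] - y[n])        # gradient of the data fidelity term
            if n < len(x) - 1:
                g -= 2.0 * lam * d[n]      # from (x[n+1] - x[n])^2
            if n > 0:
                g += 2.0 * lam * d[n - 1]  # from (x[n] - x[n-1])^2
            grad.append(g)
        x = [xn - step * gn for xn, gn in zip(x, grad)]
    return x


y = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
x = tikhonov_denoise(y, lam=1.0)
print(tikhonov_cost(x, y, 1.0) < tikhonov_cost(y, y, 1.0))  # True
```

The sharp step in `y` becomes a gradual ramp in `x`, which is the blurring behavior of the Tikhonov prior described above.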

To achieve reconstructed images with sharper edges, in the spirit of the total variation criterion [31],
[32] and a related method called the bilateral filter⁹ [33], [34], a robust regularizer called Bilateral-TV
(B-TV) was introduced in [3]. The B-TV regularizing function looks like:

$$\Upsilon(X) = \sum_{l=-P}^{P} \sum_{m=-P}^{P} \alpha^{|m|+|l|} \left\| X - S_x^l S_y^m X \right\|_1, \tag{6}$$

where $S_x^l$ and $S_y^m$ are the operators corresponding to shifting the image represented by X by $l$ pixels in
the horizontal direction and $m$ pixels in the vertical direction, respectively. This cost function in effect computes
derivatives across multiple scales of resolution (as determined by the parameter $P$). The scalar weight
$0 < \alpha < 1$ is applied to give a spatially decaying effect to the summation of the regularization term. The
parameter $P$ defines the size of the corresponding bilateral filter kernel. The bilateral filter and its
parameters are extensively discussed in [33], [34], and [3].
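A direct Python transcription of the B-TV cost (6) is sketched below. The wrap-around boundary handling in the hypothetical `shift2d` helper is an assumption made for brevity (the text does not specify boundary conditions), and the default values of P and α are arbitrary illustrative choices.

```python
def shift2d(img, l, m):
    """S_x^l S_y^m: shift by l pixels horizontally and m vertically (wrap-around assumed)."""
    h, w = len(img), len(img[0])
    return [[img[(i - m) % h][(j - l) % w] for j in range(w)] for i in range(h)]


def btv(img, P=2, alpha=0.7):
    """Bilateral-TV cost (6): weighted L1 norms of multi-scale shifted differences."""
    total = 0.0
    for l in range(-P, P + 1):
        for m in range(-P, P + 1):
            shifted = shift2d(img, l, m)
            l1 = sum(abs(a - b) for row_a, row_b in zip(img, shifted)
                     for a, b in zip(row_a, row_b))
            total += alpha ** (abs(l) + abs(m)) * l1
    return total


delta = [[0.0] * 5 for _ in range(5)]
delta[2][2] = 1.0
print(btv(delta, P=1, alpha=0.5))  # 6.0
```

For a single bright pixel each nonzero shift (l, m) contributes 2α^(|l|+|m|): four shifts with weight 0.5 and four with weight 0.25 give 6.0.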

The performance of B-TV and Tikhonov priors is thoroughly studied in [3]. The B-TV regularization
is used in Section IV to help reconstruct the luminance component of the demosaiced images. Note that
these two regularization terms, in the presented form, do not consider the correlation of different color
bands.

D. Color Regularization 

To reduce color artifacts, a few MAP based demosaicing algorithms have adapted regularization 
terms for color channels. Typically, the color regularization priors are either applied on the chrominance 
component of an image (after transforming to a suitable color space such as YIQ representation [19]), 
or directly on the RGB bands [18]. While the former can be easily implemented by some isotropic 
smoothing priors such as Tikhonov regularization, the latter is computationally more complicated. 

Note that, although different bands may have larger or smaller gradient magnitudes at a particular edge,
it is reasonable to assume the same edge orientation and location for all color channels. That is to say,
if an edge appears in the red band at a particular location and orientation, then an edge with the same
location and orientation should appear in the other color bands. Therefore, a cost function that penalizes
the difference in edge location and/or orientation of different color bands incorporates the prior knowledge
that different color bands are correlated. We will employ such a cost function in Section IV to remove color
artifacts.

⁹Note that by adopting a different realization of the bilateral filter, [26] has proposed a successful single-frame demosaicing
method.

Following [18], minimizing the vector product norm of any two adjacent color pixels forces different
bands to have similar edge location and orientation. The vector (outer) product of $M = [m_r, m_g, m_b]^T$
and $N = [n_r, n_g, n_b]^T$, which represent the color values of two adjacent pixels, is defined as:

$$\|M \times N\|_2^2 = \left[\|M\|_2 \|N\|_2 \sin(\theta)\right]^2 = \|i(m_g n_b - m_b n_g)\|_2^2 + \|j(m_b n_r - m_r n_b)\|_2^2 + \|k(m_r n_g - m_g n_r)\|_2^2,$$

where $\theta$ is the angle between these two vectors. As the data fidelity penalty term will restrict the values
of $\|M\|$ and $\|N\|$, minimization of $\|M \times N\|_2^2$ will minimize $\sin(\theta)$, and consequently $\theta$ itself,
where a small value of $\theta$ is an indicator of similar orientation.
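The squared norm of the cross product can be computed directly from the component formula above. The Python sketch below (hypothetical helper names) also checks it against Lagrange's identity, ‖M × N‖² = ‖M‖²‖N‖² − (M·N)², which equals [‖M‖‖N‖ sin θ]².

```python
def cross_norm_sq(m, n):
    """||M x N||_2^2 for RGB vectors m = (m_r, m_g, m_b) and n = (n_r, n_g, n_b)."""
    mr, mg, mb = m
    nr, ng, nb = n
    return ((mg * nb - mb * ng) ** 2     # i component
            + (mb * nr - mr * nb) ** 2   # j component
            + (mr * ng - mg * nr) ** 2)  # k component


def lagrange_identity(m, n):
    """||M||^2 ||N||^2 - (M . N)^2, i.e. [||M|| ||N|| sin(theta)]^2."""
    dot = sum(a * b for a, b in zip(m, n))
    return sum(a * a for a in m) * sum(b * b for b in n) - dot ** 2


print(cross_norm_sq((1, 2, 3), (2, 4, 6)))  # 0: parallel colors, same orientation
print(cross_norm_sq((1, 2, 3), (4, 5, 6)), lagrange_identity((1, 2, 3), (4, 5, 6)))  # 54 54
```

Two neighboring pixels whose RGB vectors differ only by a brightness scale give zero penalty, so the term tolerates intensity edges while penalizing changes in color direction.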

E. Data Fidelity 

One of the most common cost functions to measure the closeness of the final solution to the measured
data is the least-squares (LS) cost function, which minimizes the L2 norm of the residual vector,

$$\rho(Y, TX) = \|Y - TX\|_2^2 \tag{7}$$

(see [9], [10], [35] as representative works). For the case where the noise V is additive white, zero-mean
Gaussian, this approach has the interpretation of providing the Maximum Likelihood (ML) estimate of
X [9]. However, a statistical study of the noise properties found in many real image sequences used for
multi-frame image fusion techniques suggests that heavy-tailed noise distributions such as Laplacian are
more appropriate models (especially in the presence of the inevitable motion estimation error) [36]. In
[3], an alternative data fidelity term based on the L1 norm, which has been shown to be very robust to
data outliers, was recently used:

$$\rho(Y, TX) = \|Y - TX\|_1 \tag{8}$$

Note that minimizing the L1 norm yields the ML estimate of X in the presence of Laplacian noise. The performance
of the L1 and L2 norms is compared and discussed in [3]. In this paper (Section IV), we have adopted the L1 norm
(which is known to be more robust than L2) as the data fidelity measure.
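The robustness argument can be seen in the simplest case, where T is the identity and several measurements of a single pixel are fused: the L2 cost (7) is minimized by the sample mean and the L1 cost (8) by the sample median. The Python sketch below (hypothetical helper names, illustrative data) shows a single outlier dragging the mean while leaving the median untouched.

```python
import statistics


def l2_cost(x, samples):
    return sum((s - x) ** 2 for s in samples)


def l1_cost(x, samples):
    return sum(abs(s - x) for s in samples)


# Five measurements of the same HR pixel, one of them an outlier
# (for example, caused by a motion estimation error).
samples = [1.0, 1.0, 1.0, 1.0, 100.0]
x_l2 = statistics.mean(samples)    # minimizer of the L2 cost
x_l1 = statistics.median(samples)  # minimizer of the L1 cost
print(x_l2, x_l1)  # 20.8 1.0
print(l1_cost(x_l1, samples) < l1_cost(x_l2, samples))  # True
print(l2_cost(x_l2, samples) < l2_cost(x_l1, samples))  # True
```

The same mean-versus-median distinction reappears in the Shift-And-Add step of the next subsection.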
F. Speed-Ups for the Special Case of Translation Motion and Common Space-Invariant Blur

Considering a translational motion model and a common¹⁰ space-invariant PSF, the operators H and $F(k)$
are commutative ($F(k)H = HF(k)$). We can rewrite (1) as

$$Y_i(k) = D_i(k)F(k)HX_i + V_i(k), \qquad k = 1,\dots,N, \quad i = R,G,B. \tag{9}$$

By substituting Z = HX, the inverse problem may be separated into the much simpler sub-tasks of:

1) Fusing the available images and estimating a blurred high-resolution image from the low-resolution 
measurements (we call this result Z). 

2) Estimating the deblurred image X from Z. 

The optimality of this method is extensively discussed in [3], where it is shown that Z is the weighted 
mean (mean or median operators, for the cases of the L2 norm and L1 norm, respectively) of all measurements
at a given pixel, after proper zero filling and motion compensation. We call this operation Shift-And-Add,
which greatly speeds up the task of multi-frame image fusion under the assumptions made. To compute
the Shift-And-Add image, first the relative motion between all LR frames is computed. Then, a set of 
HR images is constructed by up-sampling each LR frame by zero filling. Then, these HR frames are 
registered with respect to the relative motion of the corresponding LR frames. A pixel-wise mean or 
median operation on the non-zero values of these HR frames will result in the Shift-And-Add image. 
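The Shift-And-Add procedure described above can be sketched in Python as follows, assuming the relative motions have already been estimated and rounded to integer offsets on the HR grid. The function name, the (dy, dx) sign convention, and the use of per-pixel sample lists (instead of marking empty HR pixels with zeros, so that genuinely zero measurements are not discarded) are illustrative assumptions. HR pixels that receive no measurement are returned as None, playing the role of the “?” entries in Figure 3.

```python
import statistics


def shift_and_add(frames, shifts, r, use_median=True):
    """Fuse LR frames on an r-times finer grid (Shift-And-Add sketch).

    frames: list of equally sized LR images (2-D lists).
    shifts: per-frame integer (dy, dx) offsets on the HR grid (assumed known).
    Returns the HR image, with None where no measurement landed.
    """
    h, w = len(frames[0]), len(frames[0][0])
    samples = [[[] for _ in range(r * w)] for _ in range(r * h)]
    for frame, (dy, dx) in zip(frames, shifts):
        for i in range(h):
            for j in range(w):
                # Zero-filled up-sampling places LR pixel (i, j) at HR (r*i, r*j);
                # registration then moves it by the frame's HR-grid offset.
                y, x = r * i + dy, r * j + dx
                if 0 <= y < r * h and 0 <= x < r * w:
                    samples[y][x].append(frame[i][j])
    combine = statistics.median if use_median else statistics.mean
    return [[combine(s) if s else None for s in row] for row in samples]


a = [[1.0, 2.0], [3.0, 4.0]]
b = [[5.0, 6.0], [7.0, 8.0]]
for row in shift_and_add([a, b], [(0, 0), (1, 1)], r=2):
    print(row)
# [1.0, None, 2.0, None]
# [None, 5.0, None, 6.0]
# [3.0, None, 4.0, None]
# [None, 7.0, None, 8.0]
```

For multi-frame demosaicing, one would run this separately per color band, feeding in only the CFA sites of that band; the per-pixel sample counts are then what the weighting matrices of Section IV are built from.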

In the next section, we use the penalty terms described in this section to formulate our proposed method 
of multi-frame demosaicing and color super-resolution. 

IV. MULTI-FRAME DEMOSAICING

In Section II-C we indicated how multi-frame demosaicing is fundamentally different from single-frame
demosaicing. In this section, we propose a computationally efficient MAP estimation method to
fuse and demosaic a set of LR frames (which may have been color-filtered by any CFA), resulting in
a color image with higher spatial resolution and reduced color artifacts. Our MAP-based cost function
consists of the following terms, briefly motivated in the previous section:

1) A penalty term to enforce similarities between the raw data and the HR estimate (Data Fidelity
Penalty Term).

2) A penalty term to encourage sharp edges in the luminance component of the HR image (Spatial 
Luminance Penalty Term). 

¹⁰That is, $H(k) = H$, which is true when all images are acquired with the same camera.
3) A penalty term to encourage smoothness in the chrominance component of the HR image (Spatial
Chrominance Penalty Term). 

4) A penalty term to encourage homogeneity of the edge location and orientation in different color
bands (Inter-Color Dependencies Penalty Term).

Each of these penalty terms will be discussed in more detail in the following subsections.

A. Data Fidelity Penalty Term 

This term measures the similarity between the resulting HR image and the original LR images. As
explained in Section III-E and [3], L1 norm minimization of the error term results in robust reconstruction
of the HR image in the presence of uncertainties such as motion error. Considering the general motion
and blur model of (1), the data fidelity penalty term is defined as:

$$J_0(X) = \sum_{i=R,G,B} \sum_{k=1}^{N} \left\| D_i(k)H(k)F(k)X_i - Y_i(k) \right\|_1 \tag{10}$$

Note that the above penalty function is applicable for general models of data, blur, and motion. However,
in this paper we only treat the simpler case of a common space-invariant PSF and translational motion.
This could, for example, correspond to a vibrating camera acquiring a sequence of images from a static
scene.

For this purpose, we use the two-step method of Section III-F to represent the data fidelity penalty term,
which is easier to interpret and has a faster implementation potential [3]. This simplified data fidelity
penalty term is defined as:

$$J_0(X) = \sum_{i=R,G,B} \left\| \Phi_i \left( H X_i - Z_i \right) \right\|_1, \tag{11}$$

where $Z_R$, $Z_G$, and $Z_B$ are the three color channels of the color Shift-And-Add image, Z. The matrix
$\Phi_i$ (i = R, G, B) is a diagonal matrix with diagonal values equal to the square root of the number of
measurements that contributed to make each element of $Z_i$ (in the square case, $\Phi_i$ is the identity
matrix). Thus, the undefined pixels of $Z_i$ have no effect on the HR estimate. On the other hand, those
pixels of $Z_i$ which have been produced from numerous measurements have a stronger effect in the
estimation of the HR frame. The $\Phi_{i \in \{R,G,B\}}$ matrices for the multi-frame demosaicing problem
are sparser than the corresponding matrices in the color SR case. The vectors $X_R$, $X_G$, and $X_B$ are
the three color components of the reconstructed HR image X.
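To make the roles of $Z_i$ and $\Phi_i$ in (11) concrete, here is a minimal Python/NumPy sketch for one color channel. It assumes purely translational motion by integer offsets on the HR grid and takes the pixelwise median of all measurements that land on each HR pixel (the robust variant of Shift-And-Add associated with an L1 data term). The function `shift_and_add`, its arguments, and the pixel-placement convention are hypothetical and are not the authors' implementation.

```python
import numpy as np

def shift_and_add(frames, masks, shifts, r):
    """Hypothetical helper: build Z_i and the diagonal of Phi_i for one color channel.

    frames: list of (M, M) LR arrays for this channel (raw CFA samples).
    masks:  list of (M, M) boolean arrays, True where this channel was measured.
    shifts: list of integer (dy, dx) offsets on the HR grid, 0 <= dy, dx < r.
    r:      resolution enhancement factor.
    Assumed convention: LR pixel (a, b) of a frame with offset (dy, dx)
    lands on HR pixel (r*a + dy, r*b + dx).
    """
    M = frames[0].shape[0]
    N = r * M
    samples = [[[] for _ in range(N)] for _ in range(N)]
    for y, mask, (dy, dx) in zip(frames, masks, shifts):
        for a, b in zip(*np.nonzero(mask)):
            samples[r * a + dy][r * b + dx].append(y[a, b])
    Z = np.zeros((N, N))
    count = np.zeros((N, N))
    for p in range(N):
        for q in range(N):
            if samples[p][q]:
                # Pixelwise median of all measurements on this HR pixel.
                Z[p, q] = np.median(samples[p][q])
                count[p, q] = len(samples[p][q])
    # Diagonal of Phi_i: square root of the number of contributing measurements.
    # It is zero where Z_i is undefined, so those pixels drop out of (11).
    phi = np.sqrt(count)
    return Z, phi
```

Because a Bayer mask samples only one channel per LR pixel, `count` (and hence `phi`) is zero at many more HR locations for raw CFA input than for already demosaiced frames, which is the sparsity difference noted above.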
B. Spatial Luminance Penalty Term

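The human eye is more sensitive to the details in the luminance component of an image than to the
details in the chrominance components [19]. Therefore, it is important that the edges in the luminance
component of the reconstructed HR image look sharp. As explained in Section III-C, applying B-TV
regularization to the luminance component results in this desired property [3]. The luminance image
can be calculated as the weighted sum $X_L = 0.299X_R + 0.587X_G + 0.114X_B$, as explained in [37].
The luminance regularization term is then defined as:

$$J_1(X) = \sum_{l=-P}^{P} \sum_{m=-P}^{P} \alpha^{|m|+|l|} \left\| X_L - S_x^l S_y^m X_L \right\|_1 \tag{12}$$

where $S_x^l$ and $S_y^m$ shift $X_L$ by l and m pixels in the horizontal and vertical directions,
respectively, and the scalar weight $0 < \alpha < 1$ gives a spatially decaying effect to the summation.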
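A direct evaluation of (12) can be sketched as follows. The sketch assumes the RGB estimate is stored as an (H, W, 3) NumPy array and that the shift operators $S_x^l$ and $S_y^m$ wrap around at the image borders (`np.roll`); the border handling and the helper names are assumptions for illustration.

```python
import numpy as np

def luminance(X):
    """X_L = 0.299 X_R + 0.587 X_G + 0.114 X_B for an (H, W, 3) RGB array."""
    return 0.299 * X[..., 0] + 0.587 * X[..., 1] + 0.114 * X[..., 2]

def btv_luminance_cost(X, P=2, alpha=0.9):
    """J_1 of (12): sum over |l|, |m| <= P of alpha^(|m|+|l|) * ||X_L - S_x^l S_y^m X_L||_1."""
    XL = luminance(X)
    cost = 0.0
    for l in range(-P, P + 1):
        for m in range(-P, P + 1):
            # S_x^l shifts along columns (axis 1), S_y^m along rows (axis 0); circular borders assumed.
            shifted = np.roll(np.roll(XL, l, axis=1), m, axis=0)
            cost += alpha ** (abs(l) + abs(m)) * np.abs(XL - shifted).sum()
    return float(cost)
```

The L1 norm charges a luminance jump in proportion to its size rather than its square, which is why this term tolerates sharp edges better than an L2 smoothness penalty.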

C. Spatial Chrominance Penalty Term 

Spatial regularization is also required for the chrominance layers. However, since the HVS is less
sensitive to the resolution of these bands, we can use a simpler regularization, based on the L2 norm [3]:

$$J_2(X) = \left\| \Lambda X_{C1} \right\|_2^2 + \left\| \Lambda X_{C2} \right\|_2^2 \tag{13}$$

where the images $X_{C1}$ and $X_{C2}$ are the I and Q layers in the YIQ color representation⁶.
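The chrominance penalty (13) can be sketched in the same style. Two choices here are assumptions rather than statements about the authors' implementation: $\Lambda$ is taken to be a 5-point Laplacian with circular borders, and the I and Q layers use the commonly quoted NTSC YIQ coefficients.

```python
import numpy as np

# Commonly quoted NTSC YIQ rows for the I and Q layers (assumed coefficients).
I_ROW = np.array([0.596, -0.274, -0.322])
Q_ROW = np.array([0.211, -0.523, 0.312])

def laplacian(img):
    """Assumed high-pass operator Lambda: 5-point Laplacian with circular borders."""
    return (np.roll(img, 1, axis=0) + np.roll(img, -1, axis=0)
            + np.roll(img, 1, axis=1) + np.roll(img, -1, axis=1) - 4.0 * img)

def chrominance_cost(X):
    """J_2 of (13): ||Lambda X_C1||_2^2 + ||Lambda X_C2||_2^2 for an (H, W, 3) RGB array."""
    X_C1 = X @ I_ROW  # I layer, shape (H, W)
    X_C2 = X @ Q_ROW  # Q layer, shape (H, W)
    return float(np.sum(laplacian(X_C1) ** 2) + np.sum(laplacian(X_C2) ** 2))
```

Both coefficient rows sum to zero, so a gray image (R = G = B) has zero I and Q layers and incurs, up to rounding, no chrominance penalty; the term smooths color variations without touching luminance detail.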


D. Inter-Color Dependencies Penalty Term 


This term penalizes the mismatch between locations or orientations of edges across the color bands. As
described in Section III-D, the authors of [18] suggest a pixelwise inter-color dependencies cost function
to be minimized. This term uses the vector outer-product norm of all pairs of neighboring pixels and is
solved by the finite element method. With some modifications to what was proposed in [18], our
inter-color dependencies penalty term is a differentiable cost function:

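$$J_3(X) = \sum_{l=-1}^{1} \sum_{m=-1}^{1} \Big[ \left\| X_G \odot S_x^l S_y^m X_B - X_B \odot S_x^l S_y^m X_G \right\|_2^2 + \left\| X_B \odot S_x^l S_y^m X_R - X_R \odot S_x^l S_y^m X_B \right\|_2^2 + \left\| X_R \odot S_x^l S_y^m X_G - X_G \odot S_x^l S_y^m X_R \right\|_2^2 \Big] \tag{14}$$

where $\odot$ is the element-by-element multiplication operator.

⁶The Y layer ($X_L$) is treated in (12).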
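Each summand in (14) is one component of the cross product between the color vector $(X_R, X_G, X_B)$ at a pixel and the color vector at its shifted neighbor, so the term vanishes wherever neighboring color vectors are parallel, i.e. where the bands change together. A minimal sketch of $J_3$, again assuming circular shifts and using illustrative helper names:

```python
import numpy as np

def shift(img, l, m):
    """S_x^l S_y^m with circular borders (an assumption)."""
    return np.roll(np.roll(img, l, axis=1), m, axis=0)

def intercolor_cost(XR, XG, XB):
    """J_3 of (14): squared cross-product components over the 3 x 3 neighborhood."""
    cost = 0.0
    for l in (-1, 0, 1):
        for m in (-1, 0, 1):
            sR, sG, sB = shift(XR, l, m), shift(XG, l, m), shift(XB, l, m)
            cost += np.sum((XG * sB - XB * sG) ** 2)
            cost += np.sum((XB * sR - XR * sB) ** 2)
            cost += np.sum((XR * sG - XG * sR) ** 2)
    return float(cost)
```

Bands that are scaled copies of one another (identical edge locations and orientations) give zero cost, while an edge that appears in one band at a different position than in another is penalized.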
E. Overall Cost Function


The overall cost function is the summation of the cost functions described in the previous subsections:


$$\hat{X} = \underset{X}{\operatorname{ArgMin}} \left[ J_0(X) + \lambda' J_1(X) + \lambda'' J_2(X) + \lambda''' J_3(X) \right] \tag{15}$$

Steepest descent optimization may be applied to minimize this cost function. In the first step, the derivative
of (15) with respect to one of the color bands is calculated, assuming the other two color bands are fixed.
In the next steps, the derivative is computed with respect to the other color channels. For example,
the derivative with respect to the Green band ($X_G$) is calculated as follows:


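$$
\begin{aligned}
\nabla X_G = {} & H^T \Phi_G^T \operatorname{sign}\left(\Phi_G H X_G - \Phi_G Z_G\right) \\
& + \lambda' \sum_{l=-P}^{P} \sum_{m=-P}^{P} \alpha^{|m|+|l|} \times 0.5870 \times \left[ I - S_x^{-l} S_y^{-m} \right] \operatorname{sign}\Big( 0.2989\,(X_R - S_x^l S_y^m X_R) + 0.5870\,(X_G - S_x^l S_y^m X_G) + 0.1140\,(X_B - S_x^l S_y^m X_B) \Big) \\
& + \lambda''' \sum_{l=-1}^{1} \sum_{m=-1}^{1} \Big[ 2\left(\mathbf{X}_B^{l,m} - S_x^{-l} S_y^{-m} \mathbf{X}_B\right)\left(\mathbf{X}_B^{l,m} X_G - \mathbf{X}_B S_x^l S_y^m X_G\right) + 2\left(\mathbf{X}_R^{l,m} - S_x^{-l} S_y^{-m} \mathbf{X}_R\right)\left(\mathbf{X}_R^{l,m} X_G - \mathbf{X}_R S_x^l S_y^m X_G\right) \Big] \\
& + \lambda'' \Lambda^T \Lambda \left( -0.1536 \times X_R + 0.2851 \times X_G - 0.1316 \times X_B \right)
\end{aligned}
\tag{16}
$$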


where $S_x^{-l}$ and $S_y^{-m}$ define the transposes of matrices $S_x^l$ and $S_y^m$, respectively, and have a
shifting effect in the opposite directions of $S_x^l$ and $S_y^m$. The notation $\mathbf{X}_R$ and $\mathbf{X}_B$
stands for the diagonal matrix representations of the Red and Blue bands, and $\mathbf{X}_R^{l,m}$ and
$\mathbf{X}_B^{l,m}$ are the diagonal representations of these matrices shifted by l and m pixels in the
horizontal and vertical directions, respectively. The calculation of the inter-color dependencies term
derivative is explained in Appendix I.

Matrices H, Λ, D, $S_x^l$, and $S_y^m$ and their transposes can be exactly interpreted as direct image
operators such as blur, high-pass filtering, masking, down-sampling, and shift. Implementing the effects
of these matrices as a sequence of operators applied directly to the images spares us from explicitly
constructing them as matrices. This property allows our method to be implemented in a fast and
memory-efficient way.
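The matrix-free idea can be sketched with NumPy: blur, its transpose, and the shift operators are applied as image operations, and the transposes follow from flipping the kernel or reversing the shift. The sketch assumes circular boundary conditions, under which these adjoint relations hold exactly; with other border handling the transposes need extra care. The function names are illustrative, and the inner-product identity $\langle Hx, y\rangle = \langle x, H^T y\rangle$ can be used to check any such operator pair.

```python
import numpy as np

def blur(img, k):
    """Matrix-free H: circular 2-D convolution with an odd-sized square kernel k."""
    c = k.shape[0] // 2
    out = np.zeros(img.shape)
    for i in range(k.shape[0]):
        for j in range(k.shape[1]):
            out += k[i, j] * np.roll(np.roll(img, i - c, axis=0), j - c, axis=1)
    return out

def blur_T(img, k):
    """H^T: the same operator with the kernel flipped in both directions."""
    return blur(img, k[::-1, ::-1])

def shift(img, l, m):
    """S_x^l S_y^m: circular shift by l pixels horizontally and m pixels vertically."""
    return np.roll(np.roll(img, l, axis=1), m, axis=0)

def shift_T(img, l, m):
    """(S_x^l S_y^m)^T = S_x^{-l} S_y^{-m}: shift in the opposite directions."""
    return shift(img, -l, -m)
```

No operator is ever stored as a matrix, so memory grows only with the image size.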

The gradients of the other channels are computed in the same way, and the following steepest
(coordinate) descent iterations are set up to calculate the HR image estimate iteratively:

$$X_i^{n+1} = X_i^{n} - \beta\, \nabla X_i^{n}, \qquad i = R, G, B \tag{17}$$

where the scalar β is the step size.
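As an illustration of iteration (17), the sketch below runs steepest descent on a deliberately reduced, single-channel cost: the data term (11) plus a B-TV term applied directly to that channel, i.e. a grayscale analogue of the first two terms of (16). It omits the chrominance and inter-color terms and the luminance mixing of the bands, assumes circular boundaries and an odd-sized square PSF, and uses hypothetical function names; the defaults for `beta`, `alpha`, and `lam` are placeholders, not tuned values.

```python
import numpy as np

def shift(img, l, m):
    """Circular S_x^l S_y^m (boundary handling is an assumption)."""
    return np.roll(np.roll(img, l, axis=1), m, axis=0)

def blur(img, k):
    """Circular convolution with an odd-sized square PSF k (matrix-free H)."""
    c = k.shape[0] // 2
    out = np.zeros(img.shape)
    for i in range(k.shape[0]):
        for j in range(k.shape[1]):
            out += k[i, j] * shift(img, j - c, i - c)
    return out

def grad_channel(X, Z, phi, k, lam=0.01, P=1, alpha=0.9):
    """Gradient of ||Phi (H X - Z)||_1 + lam * sum alpha^(|l|+|m|) ||X - S X||_1 on one channel."""
    # Data term: H^T Phi^T sign(Phi H X - Phi Z); Phi is diagonal, stored as the array phi.
    g = blur(phi * np.sign(phi * (blur(X, k) - Z)), k[::-1, ::-1])
    for l in range(-P, P + 1):
        for m in range(-P, P + 1):
            s = np.sign(X - shift(X, l, m))
            # [I - S_x^{-l} S_y^{-m}] sign(X - S_x^l S_y^m X)
            g += lam * alpha ** (abs(l) + abs(m)) * (s - shift(s, -l, -m))
    return g

def steepest_descent(Z, phi, k, X0=None, beta=0.002, iters=50, **kw):
    """Iteration (17): X^{n+1} = X^n - beta * grad, starting from Z unless X0 is given."""
    X = np.array(Z if X0 is None else X0, dtype=float)
    for _ in range(iters):
        X = X - beta * grad_channel(X, Z, phi, k, **kw)
    return X
```

As a sanity check: with a delta PSF, `phi` all ones, `Z` zero, and a constant initial guess of one, the B-TV term contributes nothing and each step subtracts exactly `beta` from every pixel, so 50 iterations with `beta=0.002` end at 0.9.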
V. Related Methods

As mentioned earlier, there has been very little work on the problem we have posed here. One related 
paper is the work of Zomet and Peleg [38], who have recently proposed a novel method for combining 
the information from multiple sensors, which can also be used for demosaicing purposes. Although their 
method has produced successful results for the single frame demosaicing problem, it is not specifically 
posed or directed towards solving the multi-frame demosaicing problem, and no multi-frame demosaicing 
case experiment is given. 

The method of [38] is based on the assumption of an affine relation between the intensities of different
sensors in a local neighborhood. To estimate the Red channel, first, affine relations that project the Green
and Blue channels to the Red channel are computed. In the second stage, a super-resolution algorithm
(e.g. the method of [7]) is applied to the available LR images in the Red channel (i.e. the original CFA
data of the Red channel plus the projected Green and Blue channels) to estimate the HR Red channel
image. A similar procedure estimates the HR Green and Blue channel images. As the affine model is not
always valid for all sensors or image sets, an affine model validity test is utilized in [38]. In the case
that the affine model is not valid for some pixels, those projected pixels are simply ignored.

The method of [38] is highly dependent on the validity of the affine model, which is not confirmed
for the multi-frame case with inaccurate registration artifacts. Besides, the original CFA LR image of a
channel and the less reliable projected LR images of other channels are equally weighted to construct
the missing values, and this does not appear to be an optimal solution.

In contrast to their method, our proposed technique exploits the correlation of the information in 
different channels explicitly to guarantee similar edge position and orientation in different color bands. 
Our proposed method also exploits the difference in sensitivity of the human eye to the frequency content 
and outliers in the luminance and chrominance components of the image. 

In parallel to our work, Gotoh and Okutomi [39] are proposing another MAP estimation method
for solving the same joint demosaicing/super-resolution problem. While their algorithm and ours share
much in common, there are fundamental differences between them in robustness to model errors and in
the prior used. Model errors, such as choice of blur or motion estimation errors, are treated favorably
by our algorithm due to the L1 norm employed in the likelihood fidelity term. By contrast, in [39], an
L2-norm data fusion term is used, which is not robust to such errors. In [3] it is shown how this
difference in norm can become crucial in obtaining better results in the presence of model mismatches.

As to the choice of prior, ours is built of several pieces, giving an overall edge-preserved outcome,
smoothed chrominance layers, and forced edge and orientation alignment between color layers. By
contrast, [39] utilizes an anisotropic Tikhonov (L2 norm) method of regularization.

VI. Experiments 

Experiments on synthetic and real data sets are presented in this section. In the first experiment,
following the model of (1), we created a sequence of LR frames from an original HR image (Figure
4.a), which is a color image with full RGB values. First, we shifted this HR image by one pixel in
the vertical direction. Then, to simulate the effect of the camera PSF, each color band of this shifted image
was convolved with a symmetric Gaussian low-pass filter of size 5 × 5 with standard deviation equal to
one. The resulting image was subsampled by a factor of 4 in each direction. The same process with
different motion vectors (shifts) in the vertical and horizontal directions was used to produce 10 LR images
from the original scene. The horizontal shift between the low-resolution images varied between 0 and
0.75 pixels in the low-resolution grid (0 to 3 pixels in the high-resolution grid). The vertical shift between
the low-resolution images varied between 0 and 0.5 pixels in the low-resolution grid (0 to 2 pixels in the
high-resolution grid). To simulate the errors in motion estimation, a bias equal to half a pixel shift in
the LR grid was intentionally added to the known motion vector of one of the LR frames. We added
Gaussian noise to the resulting LR frames to achieve an SNR⁷ of 30 dB. Then each LR color image
was subsampled by the Bayer filter.
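The synthetic degradation of this first experiment can be sketched as a per-frame pipeline. Several details are assumptions not specified in the text: shifts are integer HR-grid offsets applied circularly, the Gaussian PSF is applied by circular convolution, the noise standard deviation is derived from the SNR definition $10\log_{10}(\sigma^2/\sigma_n^2)$ using the variance of the clean LR frame, and the Bayer layout is RGGB. The function names are hypothetical.

```python
import numpy as np

def gaussian_kernel(size=5, sigma=1.0):
    """Normalized size x size Gaussian low-pass filter (5 x 5, sigma = 1 as in the text)."""
    ax = np.arange(size) - size // 2
    g = np.exp(-(ax[:, None] ** 2 + ax[None, :] ** 2) / (2.0 * sigma ** 2))
    return g / g.sum()

def blur(img, k):
    """Circular 2-D convolution with an odd-sized square kernel (boundary handling assumed)."""
    c = k.shape[0] // 2
    out = np.zeros(img.shape)
    for i in range(k.shape[0]):
        for j in range(k.shape[1]):
            out += k[i, j] * np.roll(np.roll(img, i - c, axis=0), j - c, axis=1)
    return out

def bayer_mask(h, w):
    """(h, w, 3) boolean CFA mask; the RGGB layout is an assumption."""
    m = np.zeros((h, w, 3), dtype=bool)
    m[0::2, 0::2, 0] = True  # R
    m[0::2, 1::2, 1] = True  # G
    m[1::2, 0::2, 1] = True  # G
    m[1::2, 1::2, 2] = True  # B
    return m

def simulate_lr_frame(X, dy, dx, r=4, snr_db=30.0, rng=None):
    """Shift -> blur each band -> downsample by r -> add noise at the given SNR -> Bayer-sample."""
    rng = np.random.default_rng() if rng is None else rng
    k = gaussian_kernel()
    shifted = np.roll(np.roll(X, dy, axis=0), dx, axis=1)
    blurred = np.stack([blur(shifted[..., c], k) for c in range(3)], axis=-1)
    lr = blurred[::r, ::r, :]
    # SNR = 10 log10(sigma^2 / sigma_n^2), with sigma^2 the variance of the clean LR frame.
    noise_var = lr.var() / 10.0 ** (snr_db / 10.0)
    noisy = lr + rng.normal(0.0, np.sqrt(noise_var), lr.shape)
    mask = bayer_mask(*noisy.shape[:2])
    return np.where(mask, noisy, 0.0), mask
```

Calling this ten times with different `(dy, dx)` pairs in the ranges quoted above would produce a comparable raw sequence; the deliberate half-pixel motion bias is applied to the motion vectors given to the reconstruction, not to the data.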

In order to show what one of those measured images looks like, one of these Bayer-filtered LR images
is reconstructed by the method in [14] and shown in Figure 4.b. The above method is implemented on
Kodak DCS-200 digital cameras [40], so each LR image may be thought of as one picture taken with
this camera brand. Figure 4.c shows the result of using the more sophisticated demosaicing method
of [44]⁸.

As the motion model for this experiment is translational and the blur kernel is space invariant, we can
use the fast model of (11) to reconstruct the blurry image (Z) on the HR grid. The Shift-And-Add result
of the demosaiced LR frames after bilinear interpolation⁹, before deblurring and demosaicing, is shown
in Figure 4.d. We used the result of the Shift-And-Add method as the initialization of the iterative
multi-frame demosaicing methods. We used the original set of frames (raw data) to reconstruct an HR image
with reduced color artifacts. Figures 5.a, 5.b, and 5.c show the effect of the individual implementation
of each regularization term (luminance, chrominance, and inter-color dependencies), described in Section
IV.

⁷Signal-to-noise ratio (SNR) is defined as $10\log_{10}(\sigma^2/\sigma_n^2)$, where $\sigma^2$ and $\sigma_n^2$ are the variances of a clean frame and of the noise, respectively.

⁸We thank Prof. Ron Kimmel of the Technion for providing us with the code that implements the method in [44].

⁹Interpolation is needed as this experiment is an under-determined problem, where some pixel values are missing.

We applied the method of [44] to demosaic each of these 10 LR frames individually, and then applied
the robust super-resolution method of [3] to each resulting color channel. The result of this method is
shown in Figure 5.d. We also applied the robust super-resolution method of [3] to the raw (Bayer-filtered)
data (before demosaicing)¹⁰. The result of this method is shown in Figure 6.a. To study the effectiveness
of each regularization term, we paired the regularization terms (inter-color dependencies with luminance,
inter-color dependencies with chrominance, and luminance with chrominance); the results are shown in
Figures 6.b, 6.c, and 6.d, respectively. Finally, Figure 7.a shows the result of the implementation of (15)
with all terms. The parameters used for this example are as follows¹¹: β = 0.002, α = 0.9, λ' = 0.01,
λ'' = 150, λ''' = 1.

The resulting image (Figure 7.a) is visibly of better quality than the LR input frames and the results of
the other reconstruction methods. Quantitative measurements confirm this visual comparison. We used the
PSNR¹² and S-CIELAB¹³ measures to compare the performance of each of these methods. Table I compares
these values: the proposed method has the lowest S-CIELAB error and the highest PSNR value (and also
the best visual quality, especially in the red lifesaver section of the image).
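The PSNR used in Table I can be computed as below. The sketch assumes 8-bit data with a peak value of 255 (pass a different `peak` otherwise) and flattens both images, so the number of entries plays the role of the vector length in the PSNR definition; the function name is illustrative.

```python
import numpy as np

def psnr(X, X_hat, peak=255.0):
    """PSNR in dB: 10 log10(peak^2 * n / ||X - X_hat||_2^2), n = number of entries."""
    X = np.asarray(X, dtype=float).ravel()
    X_hat = np.asarray(X_hat, dtype=float).ravel()
    err = np.sum((X - X_hat) ** 2)
    return 10.0 * np.log10(peak ** 2 * X.size / err)
```

For example, a uniform error of 25.5 gray levels (one tenth of the peak) gives exactly 20 dB, close to the range of values reported in Table I.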

In the second experiment, we used 30 compressed images captured from a commercial webcam (PYRO-
1394). Figure 8.a shows one of these LR images (a selected region of this image is zoomed in Figure 8.e
for closer examination). Note that the compression and color artifacts are quite apparent in these images.

¹⁰To apply the monochromatic SR method of [3] to this color-filtered sequence, we treated each color band separately. To
account for the color-filtering operation, we substituted matrix A in Equation (23) of [3] with matrix Φ in (11).

¹¹The criterion for parameter selection in this example (and the other examples discussed in this paper) was to choose parameters
that produce the most visually appealing results. Therefore, to ensure fairness, each experiment was repeated several times with
different parameters, and the best result of each experiment was chosen as the outcome of each method.

¹²The PSNR of two vectors $X$ and $\hat{X}$ of size $[4r^2M^2 \times 1]$ is defined as:

$$\mathrm{PSNR}(X, \hat{X}) = 10 \log_{10} \left( \frac{255^2 \times 4r^2M^2}{\| X - \hat{X} \|_2^2} \right)$$

¹³The S-CIELAB measure is a perceptual color fidelity measure that measures how accurate the reproduction of a color is to
the original when viewed by a human observer [41]. In our experiments, we used the code with default parameters used in the 
implementation of this measure available at http://white.stanford.edu/~brian/scielab/scielab.html 
Method                          S-CIELAB       PSNR (dB)
Shift-And-Add                   1.532×10^11    17.17
LR Demosaiced [44] + SR [3]     1.349×10^11    19.12
Only Lumin.                     7.892×10^10    17.74
Only Orient.                    6.498×10^10    20.10
Only Chromin.                   4.648×10^10    20.35
SR [3] on Raw Data              5.456×10^10    19.28
Lumin. + Orient.                4.543×10^10    20.79
Orient. + Chrom.                4.382×10^10    20.68
Lumin. + Chrom.                 3.548×10^10    21.12
Full                            3.365×10^10    21.13

TABLE I

The quantitative comparison of the performance of different demosaicing methods on the lighthouse
sequence. The proposed method has the lowest S-CIELAB error and the highest PSNR value.


This set of frames was already demosaiced, and no information was available about the original sensor
values, which makes the color enhancement task more difficult. This example may also be considered as a
multi-frame color super-resolution case. The (unknown) camera PSF was assumed to be a 4 × 4 Gaussian
kernel with standard deviation equal to one. As the relative motion between these images followed the
translational model, we only needed to estimate the motion between the luminance components of these
images [42]. We used the method described in [43] to compute the motion vectors.

The Shift-And-Add result (resolution enhancement factor of 4) is shown in Figure 8.b (zoomed in
Figure 8.f). In Figure 8.c (zoomed in Figure 8.g), the method of [3] is used to increase the resolution
by a factor of 4 in each color band independently. Finally, the result of applying our method to this
sequence is shown in Figure 8.d (zoomed in Figure 8.h), where color artifacts are significantly reduced.
The parameters used for this example are as follows: β = 0.004, α = 0.9, λ' = 0.25, λ'' = 500, λ''' = 5.

In the third experiment, we used 40 compressed images of a test pattern from a surveillance camera,
courtesy of Adyoron Intelligent Systems Ltd., Tel Aviv, Israel. Figure 9.a shows one of these LR images (a
selected region of this image is zoomed in Figure 10.a for closer examination). Note that the compression
and color artifacts are quite apparent in these images. This set of frames was also already demosaiced,
and no information was available about the original sensor values, which makes the color enhancement
task more difficult. This example may also be considered as a multi-frame color super-resolution case.
The (unknown) camera PSF was assumed to be a 6 × 6 Gaussian kernel with standard deviation equal
to two.

We used the method described in [43] to compute the motion vectors. The Shift-And-Add result
(resolution enhancement factor of 4) is shown in Figure 9.b (zoomed in Figure 10.b). In Figure 9.c
(zoomed in Figure 10.c), the method of [3] is used to increase the resolution by a factor of 4 in each
color band independently. Finally, the result of applying the proposed method to this sequence is
shown in Figure 9.d (zoomed in Figure 10.d), where color artifacts are significantly reduced. Moreover,
a comparison of Figures 9.a-d shows that the compression errors have been removed more effectively in
Figure 9.d. The parameters used for this example are as follows: β = 0.004, α = 0.9, λ' = 0.25, λ'' = 500,
λ''' = 5.

In the fourth, fifth, and sixth experiments (Girl, Bookcase, and Window sequences), we used 31
uncompressed, raw CFA images (30 frames for the Window sequence) from a video camera (based on
Zoran 2MP CMOS sensors). We applied the method of [14] to demosaic each of these LR frames
individually. Figure 11.a (zoomed in Figure 12.a) shows one of these images from the Girl sequence
(the corresponding image of the Bookcase sequence is shown in Figure 13.a, and that of the Window
sequence in Figure 15.a). The result of the more sophisticated demosaicing method of [44] for the Girl
sequence is shown in Figure 11.b (zoomed in Figure 12.b). Figure 13.b shows the corresponding image for
the Bookcase sequence and Figure 15.b shows the corresponding image for the Window sequence.

To increase the spatial resolution by a factor of three, we applied the proposed multi-frame color
super-resolution method to the demosaiced images of these sequences. Figure 11.c shows the HR
color super-resolution result from the LR color images of the Girl sequence demosaiced by the method of
[14] (zoomed in Figure 12.c). Figure 13.c shows the corresponding image for the Bookcase sequence and
Figure 15.c shows the corresponding image for the Window sequence. Similarly, Figure 11.d shows the
result of resolution enhancement of the LR color images from the Girl sequence demosaiced by the method
of [44] (zoomed in Figure 12.d). Figure 13.d shows the corresponding image for the Bookcase sequence
and Figure 15.d shows the corresponding image for the Window sequence.

Finally, we directly applied the proposed multi-frame demosaicing method to the raw CFA data to
increase the spatial resolution by the same factor of three. Figure 11.e shows the HR result of multi-frame
demosaicing of the LR raw CFA images from the Girl sequence without using the inter-color dependence
term [J_3(X)] (zoomed in Figure 12.e). Figure 14.a shows the corresponding image for the Bookcase
sequence and Figure 15.e shows the corresponding image for the Window sequence. Figure 11.f shows
the HR result of applying the multi-frame demosaicing method using all proposed terms in (15) to the LR
raw CFA images from the Girl sequence (zoomed in Figure 12.f). Figure 14.b shows the corresponding image
for the Bookcase sequence and Figure 15.f shows the corresponding image for the Window sequence.

These experiments show that single-frame demosaicing methods such as [44] (which in effect implement
anti-aliasing filters) remove color artifacts at the expense of making the images more blurry. The proposed
color super-resolution algorithm can retrieve some high-frequency information and further remove the
color artifacts. However, applying the proposed multi-frame demosaicing method directly to raw CFA
data produces the sharpest results and effectively removes color artifacts. These experiments also show
the importance of the inter-color dependence term, which further removes color artifacts. The parameters
used for the experiments on the Girl, Bookcase, and Window sequences are as follows: β = 0.002, α = 0.9,
λ' = 0.1, λ'' = 250, λ''' = 25. The (unknown) camera PSF was assumed to be a tapered 5 × 5 disk PSF¹⁴.
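For readers without MATLAB, a tapered disk PSF of radius 2 can be approximated in NumPy by computing, for each kernel entry, the fraction of its pixel's area that falls inside the disk. This is an approximation by supersampling, not a reimplementation of `fspecial`, and its values may differ slightly from MATLAB's kernel; the function name and `oversample` parameter are hypothetical.

```python
import numpy as np

def tapered_disk(radius=2, oversample=64):
    """Approximate a tapered (anti-aliased) disk PSF of the given radius.

    Each kernel entry is the fraction of its pixel's area inside the disk,
    estimated on an oversample x oversample sub-grid, then normalized to sum 1.
    """
    size = 2 * radius + 1
    n = size * oversample
    # Sub-pixel sample centers spanning [-(radius + 0.5), radius + 0.5].
    t = (np.arange(n) + 0.5) / oversample - (radius + 0.5)
    inside = (t[:, None] ** 2 + t[None, :] ** 2) <= radius ** 2
    coverage = inside.reshape(size, oversample, size, oversample).mean(axis=(1, 3))
    return coverage / coverage.sum()
```

The rim pixels receive fractional weights (the "taper"), while the corner pixels of the 5 × 5 support lie entirely outside the disk and get zero weight.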
VII. Discussion and Future Work

In this paper, based on the MAP estimation framework, we proposed a unified method of demosaicing
and super-resolution, which increases the spatial resolution and reduces the color artifacts of a set of
low-quality color images. Using the L1 norm for the data error term makes our method robust to errors in
data and modeling. Bilateral regularization of the luminance term results in sharp reconstruction of edges,
and the chrominance and inter-color dependencies cost functions remove the color artifacts from the HR
estimate. All matrix-vector operations in the proposed method are implemented as simple image operators.
As these operations are performed locally on pixel values on the HR grid, parallel processing may also
be used to further increase the computational efficiency. The computational complexity of this method is
on the order of that of popular iterative super-resolution algorithms such as [9]; namely, it is linear in
the number of pixels.

The inter-color dependencies term (14) results in the non-convexity of the overall penalty function.
Therefore, the steepest descent optimization of (15) may reach a local rather than the global minimum of
the overall function. The non-convexity does not impose a serious problem if a reasonable initial guess
is used for the steepest descent method, as many experiments showed effective multi-frame demosaicing
results. In our experiments we noticed that a good initial guess is the Shift-And-Add result of the
individually demosaiced low-resolution images.

Accurate subpixel motion estimation is an essential part of any image fusion process such as multi-frame
super-resolution or demosaicing. To the best of our knowledge, no paper has addressed the problem
of estimating motion between Bayer-filtered images. However, a few papers have addressed related issues.
Ref. [42] has addressed the problem of color motion estimation, where information from different color
channels is incorporated by simply using alternative color representations such as HSV or normalized
RGB. More work remains to be done to fully analyze subpixel motion estimation from color-filtered
images.

¹⁴MATLAB command fspecial('disk',2) creates such a blurring kernel.
Appendix I

Derivation of the inter-color dependencies penalty term 


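In this appendix, we illustrate the differentiation of the first term in (14), which we call L, with respect
to $X_G$. From (14) we have:

$$L = \left\| X_G \odot S_x^l S_y^m X_B - X_B \odot S_x^l S_y^m X_G \right\|_2^2 \overset{\text{commutative}}{=} \left\| S_x^l S_y^m X_B \odot X_G - X_B \odot S_x^l S_y^m X_G \right\|_2^2$$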




We can substitute the element-by-element multiplication operator "⊙" with the differentiable dot product
by rearranging $X_B$ as the diagonal matrix¹⁵ $\mathbf{X}_B$ and $S_x^l S_y^m X_B$ as $\mathbf{X}_B^{l,m}$, which
is the diagonal form of $X_B$ shifted by l and m pixels in the horizontal and vertical directions:

$$L = \left\| \mathbf{X}_B^{l,m} X_G - \mathbf{X}_B S_x^l S_y^m X_G \right\|_2^2 \tag{18}$$


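Using the identity

$$\frac{\partial \| Q C \|_2^2}{\partial C} = \frac{\partial \left( C^T Q^T Q C \right)}{\partial C} = 2\, Q^T Q C$$

and noting that $\mathbf{X}_B^{l,m}$ and $\mathbf{X}_B$ are symmetric matrices, the differentiation with respect
to the green band is computed as follows:

$$\frac{\partial L}{\partial X_G} = 2 \left( \mathbf{X}_B^{l,m} - S_x^{-l} S_y^{-m} \mathbf{X}_B \right) \left( \mathbf{X}_B^{l,m} X_G - \mathbf{X}_B S_x^l S_y^m X_G \right)$$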


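¹⁵We are simply denoting a vector Q by its diagonal matrix representation $\mathbf{Q}$ such that:

$$Q = \begin{pmatrix} q_1 \\ q_2 \\ \vdots \\ q_{4r^2M^2} \end{pmatrix} \longrightarrow \mathbf{Q} = \begin{pmatrix} q_1 & 0 & \cdots & 0 \\ 0 & q_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & q_{4r^2M^2} \end{pmatrix}$$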
Q 
Differentiation of the second term in (14), and also differentiation with respect to the other color bands,
follows the same technique.
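The closed-form derivative above can be checked numerically. The sketch below applies the diagonal matrices as elementwise products and the transposed shift as a shift in the opposite direction (assuming circular boundaries), then compares the result with central finite differences; the function names are illustrative.

```python
import numpy as np

def shift(v, l, m):
    """Circular S_x^l S_y^m on a 2-D image (boundary handling is an assumption)."""
    return np.roll(np.roll(v, l, axis=1), m, axis=0)

def first_term(XG, XB, l, m):
    """L = ||X_G (.) S X_B - X_B (.) S X_G||_2^2 for one shift (l, m)."""
    return float(np.sum((XG * shift(XB, l, m) - XB * shift(XG, l, m)) ** 2))

def grad_first_term(XG, XB, l, m):
    """Closed form 2 (X_B^{l,m} - S_x^{-l} S_y^{-m} X_B)(X_B^{l,m} X_G - X_B S_x^l S_y^m X_G),
    applied matrix-free: diagonal matrices become elementwise products."""
    residual = shift(XB, l, m) * XG - XB * shift(XG, l, m)
    return 2.0 * (shift(XB, l, m) * residual - shift(XB * residual, -l, -m))

def max_gradient_error(XG, XB, l, m, eps=1e-6):
    """Largest gap between the closed form and central finite differences."""
    numeric = np.zeros_like(XG)
    for idx in np.ndindex(XG.shape):
        step = np.zeros_like(XG)
        step[idx] = eps
        numeric[idx] = (first_term(XG + step, XB, l, m)
                        - first_term(XG - step, XB, l, m)) / (2 * eps)
    return float(np.max(np.abs(numeric - grad_first_term(XG, XB, l, m))))
```

The second factor in the closed form is the residual of (18), and the first factor is its transpose acting on that residual, so the check exercises both the symmetry of the diagonal matrices and the transposed shift.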
a: Original image


b: LR image demosaiced by the method in [14] 



c: LR image demosaiced by the method in [44] d: Shift-And-Add image. 

Fig. 4. A HR image (a) is passed through our model of camera to produce a set of LR images. One of these LR images is 
demosaiced by the method in [14] (b). The same image is demosaiced by the method in [44] (c). Shift-And-Add on the 10 input 
LR images is shown in (d). 
c: Reconst. with chrominance regul. d: Reconst. from LR demosaiced [44]+SR[3]


Fig. 5. Multi-frame demosaicing of this set of LR frames with the help of only luminance, inter-color dependencies or 
chrominance regularization terms is shown in (a), (b), and (c), respectively. The result of applying the super-resolution method 
of [3] on the LR frames each demosaiced by the method of [44] is shown in (d).
c: Reconst. with chrominance and inter-color regul. d: Reconst. from chrominance and luminance regul.


Fig. 6. The result of super-resolving each color band (raw data before demosaicing) separately considering only bilateral
regularization [3], is shown in (a). Multi-frame demosaicing of this set of LR frames with the help of only inter-color 
dependencies-luminance, inter-color dependencies-chrominance, and luminance-chrominance regularization terms is shown in 
(b), (c), and (d), respectively. 
a: Reconst. with all terms.


Fig. 7. The result of applying the proposed method (using all regularization terms) to this data set is shown in (a). 
Fig. 8. Multi-frame color super-resolution implemented on a real data sequence. (a) shows one of the input LR images and (b) is
the Shift-And-Add result increasing resolution by a factor of 4 in each direction. (c) is the result of the individual implementation
of the super-resolution [3] on each color band. (d) is the implementation of (15), which has increased the spatial resolution, removed
the compression artifacts, and also reduced the color artifacts. Figures (e), (f), (g), and (h) are the zoomed images of Figures
(a), (b), (c), and (d), respectively.





















c: SR [3] on LR frames 


d: Proposed method 


Fig. 9. Multi-frame color super-resolution implemented on a real data sequence. (a) shows one of the input LR images and (b) is
the Shift-And-Add result increasing resolution by a factor of 4 in each direction. (c) is the result of the individual implementation
of the super-resolution [3] on each color band. (d) is the implementation of (15), which has increased the spatial resolution, removed
the compression artifacts, and also reduced the color artifacts. These images are zoomed in Figure 10.
c: SR [3] on LR frames d: Proposed method


Fig. 10. Multi-frame color super-resolution implemented on a real data sequence. A selected section of Figure 9(a), 9(b), 9(c), 
and 9(d) are zoomed in Figure 10(a), 10(b), 10(c), and 10(d), respectively. In (d) almost all color artifacts that are present on 
the edge areas of (a), (b), and (c) are effectively removed. 
Fig. 11. Multi-frame color super-resolution implemented on a real data sequence. (a) shows one of the input LR images
demosaiced by [14] and (b) is one of the input LR images demosaiced by the more sophisticated [44]. (c) is the result of
applying the proposed color super-resolution method to 31 LR images, each demosaiced by the method of [14]. (d) is the result of
applying the proposed color super-resolution method to 31 LR images, each demosaiced by the method of [44]. The result of applying
our method to the original un-demosaiced raw LR images (without using the inter-color dependence term) is shown in (e). (f)
is the result of applying our method to the original un-demosaiced raw LR images.




















Fig. 12. Multi-frame color super-resolution implemented on a real data sequence (zoomed). (a) shows one of the input LR
images demosaiced by [14] and (b) is one of the input LR images demosaiced by the more sophisticated [44]. (c) is the result
of applying the proposed color super-resolution method to 31 LR images, each demosaiced by the method of [14]. (d) is the result of
applying the proposed color super-resolution method to 31 LR images, each demosaiced by the method of [44]. The result of applying
our method to the original un-demosaiced raw LR images (without using the inter-color dependence term) is shown in (e). (f)
is the result of applying our method to the original un-demosaiced raw LR images.
Fig. 13. Multi-frame color super-resolution implemented on a real data sequence. (a) shows one of the input LR images
demosaiced by [14] and (b) is one of the input LR images demosaiced by the more sophisticated [44]. (c) is the result of
applying the proposed color super-resolution method to 31 LR images, each demosaiced by the method of [14]. (d) is the result of
applying the proposed color super-resolution method to 31 LR images, each demosaiced by the method of [44].
Fig. 14. Multi-frame color super-resolution implemented on a real data sequence. The result of applying our method to the
original un-demosaiced raw LR images (without using the inter-color dependence term) is shown in (a). (b) is the result of
applying our method to the original un-demosaiced raw LR images.
Fig. 15. Multi-frame color super-resolution implemented on a real data sequence. (a) shows one of the input LR images
demosaiced by [14] and (b) is one of the input LR images demosaiced by the more sophisticated [44]. (c) is the result of
applying the proposed color super-resolution method to 30 LR images, each demosaiced by the method of [14]. (d) is the result of
applying the proposed color super-resolution method to 30 LR images, each demosaiced by the method of [44]. The result of applying
our method to the original un-demosaiced raw LR images (without using the inter-color dependence term) is shown in (e). (f)
is the result of applying our method to the original un-demosaiced raw LR images.









